Xml Mining: from Trees to Strings

Authors

  • S. Miniaoui
  • M. Wentland Forte
Abstract

In recent years XML has become the standard for data exchange on the Web and a new data description language. Consequently, in a data mining context, optimizing the storage of and access time to XML documents is becoming a new challenge. Indeed, to mine XML documents we have to parse them in order to obtain a tree data structure in RAM. This tree structure is more flexible and offers better access and navigation time than the textual format. Moreover, the tree representation of XML documents is semantically richer than the textual one. Thus, for an efficient XML mining task, the XML mapping stage should preserve the semantics of the data (the hierarchy of concepts within the XML document) and generate a compact data structure. In this paper we present an overview of XML mapping proposals and discuss their advantages and drawbacks for efficient mining. Then, we note the relevant concepts to consider in XML mapping for efficient mining.
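
To make the tree-to-string idea in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes one common illustrative mapping, a pre-order listing of element tags in which the token "-1" marks each return to the parent, and the helper name encode and the sample document are hypothetical. The document is first parsed into an in-memory tree and then flattened into a compact string that still records the hierarchy.

```python
import xml.etree.ElementTree as ET


def encode(node):
    """Return the pre-order tag sequence of a subtree.

    Assumption for illustration only: "-1" is emitted after each child
    subtree to mark the step back up to the parent, so the hierarchy can
    be recovered from the flat token list.
    """
    tokens = [node.tag]
    for child in node:
        tokens.extend(encode(child))
        tokens.append("-1")
    return tokens


# Hypothetical sample document, used only to exercise the sketch.
doc = "<library><book><title/><author/></book><book><title/></book></library>"

root = ET.fromstring(doc)          # parsing yields a tree held in memory
encoded = " ".join(encode(root))   # mapping yields a compact string
print(encoded)
# prints: library book title -1 author -1 -1 book title -1 -1
```

Because every descent is matched by a "-1", two documents with the same tags but different nesting produce different strings, which is the property a mapping needs if it is to preserve the hierarchy of concepts.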

Similar resources

Mining XML Frequent Query Patterns

With XML being the standard for data encoding and exchange over the Internet, how to find interesting XML query characteristics efficiently becomes a critical issue. Mining frequent query patterns is a technique to discover the most frequently occurring query pattern trees from a large collection of XML queries. In this paper, we describe an efficient mining algorithm to discover the frequent que...

Towards Integrating Decision Tree with Xml Technologies

The paper proposes a method for efficiently storing collections of multi-purpose decision trees within a native distributed XML database. The predictive information for building the XML decision trees is gathered through Web mining techniques and methodologies. In order to share data from heterogeneous sources, the model employs semantic Web languages to describe and represent data sources. The u...

XML Tree Finder System: a First Step towards XML Data Mining Final Report

The problem of searching for frequent trees in a collection of tree-structured XML data is considered. The aim of the XML Tree Finder System (XTFS) is to find the tree whose exact or perturbed copies are frequent in a collection of labeled trees. The definition of a labeled tree will be given later. Frequent here means that the tree we find is the Maximal Common Tree of the collecti...

Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata

The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammars, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck ...

Mining XML-Enabled Association Rules with Templates

The XML-enabled association rule framework [8] extends the notion of associated items to XML fragments in order to present associations among trees rather than among simple-structured items of atomic values. Such rules are more flexible and powerful in representing both the simple and the complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world...

Publication date: 2005